Data Integration: Inconsistency Detection and Resolution Based on Source Properties

Authors

  • Philipp Anokhin
  • Amihai Motro
Abstract

This paper addresses the problem of integrating multiple heterogeneous information sources. The sources may conflict with each other on three levels: their schemas, their data representation, or the data themselves. Most approaches in this area resolve inconsistencies among schemas and data representations and ignore the possibility of data-level conflicts altogether. The few that acknowledge such conflicts are mostly probabilistic approaches that only detect the conflict and give the user additional information on the nature of the inconsistency (e.g., a set of conflicting values with attached probabilities). We propose an extension to the relational data model that makes use of meta-data about the information sources, called properties. This extension provides the basis for the flexible data integration technique described in this work. For a particular user query, data integration consists of three steps: constructing data blocks whose extended union constitutes the query result, detecting data conflicts, and resolving them. The paper also presents an improvement to data clustering techniques in the conflict detection phase: it uses another type of meta-information available from the sources (source descriptions in terms of the virtual database schema) to narrow down the areas of possible data conflicts. For the conflict resolution phase, a flexible algorithm is offered. The algorithm is guided by user-defined importance (weights) of properties and by expert-defined resolution strategies that incorporate domain knowledge.
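
To make the detection and resolution phases more concrete, here is a minimal sketch in Python. It is not the authors' algorithm: the property names (`recency`, `accuracy`), their values, the source names, the sample tuples, and the "highest weighted score wins" strategy are all assumptions chosen for illustration. It only shows the general idea of grouping tuples that describe the same entity, flagging groups whose sources disagree, and resolving the disagreement with user-supplied property weights.

```python
from collections import defaultdict

# Hypothetical source properties, each normalized to [0, 1].
# Names and values are assumptions, not taken from the paper.
SOURCE_PROPERTIES = {
    "src_a": {"recency": 0.9, "accuracy": 0.6},
    "src_b": {"recency": 0.4, "accuracy": 0.95},
}


def detect_conflicts(tuples, key, attr):
    """Group tuples by key; return only groups whose sources disagree on attr."""
    groups = defaultdict(list)
    for t in tuples:
        groups[t[key]].append(t)
    return {k: g for k, g in groups.items() if len({t[attr] for t in g}) > 1}


def score(source, weights):
    """Weighted sum of a source's property values under the user's weights."""
    props = SOURCE_PROPERTIES[source]
    return sum(w * props.get(name, 0.0) for name, w in weights.items())


def resolve(group, attr, weights):
    """One possible strategy: take the value from the best-scoring source."""
    best = max(group, key=lambda t: score(t["source"], weights))
    return best[attr]


# Illustrative data: the sources disagree on employee 1, agree on employee 2.
tuples = [
    {"source": "src_a", "emp_id": 1, "salary": 50000},
    {"source": "src_b", "emp_id": 1, "salary": 52000},
    {"source": "src_a", "emp_id": 2, "salary": 40000},
    {"source": "src_b", "emp_id": 2, "salary": 40000},
]

conflicts = detect_conflicts(tuples, "emp_id", "salary")
weights = {"recency": 0.3, "accuracy": 0.7}  # user-defined importance
resolved = {k: resolve(g, "salary", weights) for k, g in conflicts.items()}
print(resolved)
```

With these assumed weights the more accurate source wins for employee 1; shifting the weight toward `recency` would select the other source's value instead, which is the kind of user-driven flexibility the abstract describes.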

Similar resources

Use of Meta-data for Value-level Inconsistency Detection and Resolution During Data Integration

This paper addresses the data integration problem: there exists a collection of autonomous heterogeneous information sources that need to be integrated; users want to be able to query the collection transparently and to get a single, unambiguous answer. The sources may conflict with each other on three levels: their schemas, data representation, or data themselves. One has to resolve the confli...

Geo-Web Service Tool for Spatial Data Integrability

The integration of multi-source heterogeneous spatial data is one of the major challenges for many spatial data users. Users put much effort into identifying and overcoming inconsistency among data sets through a time-consuming and costly process. Spatial applications that rely on multi-source heterogeneous data also suffer from the lack of an automatic mechanism to identify the inconsistency items and as...

Integration of Visible Image and LIDAR Altimetric Data for Semi-Automatic Detection and Measuring the Boundaries of Features

This paper presents a new method for detecting features using LiDAR data and visible images. The proposed feature detection algorithm has minimal dependency on the region and the type of sensor used for imaging and, for any input LiDAR and image data, including visible bands (red, green and blue) with high spatial resolution, identifies features with acceptable accuracy. In the proposed app...

Inconsistency Resolution In The Virtual Database Environment Using Fuzzy Logic

Data integration from different data sources may result in data inconsistencies due to different representations of the same objects at the data sources. Many researchers have tried to solve this problem manually or by using source features. None of them took the user's preferences regarding source features into account. This paper proposes using fuzzy logic with multiple constraints, in accordance with us...

Automatic Interpretation of UltraCam Imagery by Combination of Support Vector Machine and Knowledge-based Systems

With the development of digital sensors, an increasing number of high-resolution images are available. Interpreting these images manually is not feasible, which calls for practical, fast and automatic solutions to environmental and location-based management problems. Land cover classification using high-resolution imagery is a difficult process because of the c...

Publication date: 2001